SOLUBLE POLYPEPTIDES WITH ACTIVITY OF THE NS3 SERINE
PROTEASE OF HEPATITIS C VIRUS, AND PROCESS FOR THEIR
PREPARATION AND ISOLATION

DESCRIPTION

The hepatitis C virus (HCV) is the main etiologic agent of non-A, non-B (NANB) hepatitis. It is estimated that HCV causes at least 90% of post-transfusional NANB viral hepatitis and 50% of sporadic NANB hepatitis. Although great progress has been made in the selection of blood donors and in the immunological characterisation of blood used for transfusions, there is still a high level of acute HCV infection among those receiving blood transfusions, resulting in one million or more infections every year throughout the world. Approximately 50% of HCV-infected individuals develop cirrhosis of the liver within a period that can range from 5 to 40 years, and recent clinical studies suggest that there is a correlation between chronic HCV infection and the development of hepatocellular carcinoma.

HCV is an enveloped virus with a positive-strand RNA genome of approximately 9.4 kb. It is a member of the Flaviviridae family, the other members of which are the pestiviruses and the flaviviruses.

The RNA genome of HCV has recently been sequenced. Comparison of the sequences of HCV genomes isolated in various parts of the world has shown that they can be extremely heterogeneous. Most of the HCV genome is occupied by an open reading frame (ORF) whose length varies between 9030 and 9099 nucleotides. This ORF codes for a single viral polyprotein, the length of which correspondingly varies from 3010 to 3033 amino acids. During the viral infection cycle, the polyprotein is proteolytically processed into the individual gene products necessary for replication of the virus.

The genes coding for the HCV structural proteins are located at the 5' end of the ORF, whereas the region coding for the non-structural proteins occupies the rest of the ORF. The structural proteins consist of C (core, 21 kDa), E1 (envelope, gp37) and E2 (NS1, gp61). C is a non-glycosylated protein of 21 kDa, which probably forms the viral nucleocapsid. The protein E1 is a glycoprotein of approximately 37 kDa and is believed to be a structural protein of the outer viral envelope. E2, another membrane glycoprotein of 61 kDa, is probably a second structural protein of the outer envelope of the virus.

The non-structural region starts with NS2 (p24), a hydrophobic protein of 24 kDa whose function is not known. NS3, a protein of 68 kDa which follows NS2 in the polyprotein, has two functional domains: a serine protease domain in the first 180 amino-terminal amino acids and an RNA-dependent ATPase domain in the carboxy-terminal part. The gene region corresponding to NS4 codes for NS4A (p6), a membrane protein of 54 amino acids, and NS4B (p26). The gene corresponding to NS5 codes for two proteins, NS5A (p56) and NS5B (p65), of 56 and 65 kDa, respectively. Recently it has been shown that the NS5B region has an RNA-dependent RNA polymerase activity (1).

Various molecular biological studies indicate that the signal peptidase, a protease associated with the endoplasmic reticulum of the host cell, is responsible for proteolytic processing in the structural region, that is to say at the sites C/E1, E1/E2 and E2/NS2 (2). A first protease activity of HCV is responsible for the cleavage between NS2 and NS3. This activity is contained in a region comprising both part of NS2 and the part of NS3 containing the serine protease domain, but it does not use the same catalytic mechanism as the serine protease (3). By contrast, the serine protease contained in the 180 amino-terminal amino acids of NS3 is responsible for cleavage at the junctions between NS3 and NS4A, between NS4A and NS4B, between NS4B and NS5A, and between NS5A and NS5B (4-8). In particular, it has been found that cleavage by this serine protease leaves a cysteine or threonine residue on the amino-terminal side of the scissile bond (position P1) and an alanine or serine residue on the carboxy-terminal side (position P1') of the substrate (6, 9). Recently it has been shown that NS4A binds the N-terminal end of NS3 through its central hydrophobic portion, thereby increasing the proteolytic activity of NS3 at all the cleavage sites on the polyprotein (10-12).
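The P1/P1' rule stated above can be illustrated with a short Python sketch. It is a deliberately crude filter that assumes only the two positions named in the text matter; real NS3 substrate recognition also depends on other residues, so it will over-predict sites. The example uses the NS4A-NS4B substrate peptide from Example 1 (DEEMECSSHLPYK); the function names are hypothetical.

```python
# Hypothetical helper: flag peptide bonds that satisfy the P1/P1' rule stated above.
P1_RESIDUES = {"C", "T"}        # cysteine or threonine at P1
P1_PRIME_RESIDUES = {"A", "S"}  # alanine or serine at P1'


def candidate_cleavage_sites(sequence: str) -> list[int]:
    """Return indices i such that the bond between sequence[i] (P1)
    and sequence[i + 1] (P1') matches the P1/P1' rule."""
    seq = sequence.upper()
    return [
        i
        for i in range(len(seq) - 1)
        if seq[i] in P1_RESIDUES and seq[i + 1] in P1_PRIME_RESIDUES
    ]


def cleave(sequence: str, site: int) -> tuple[str, str]:
    """Split the sequence at the bond that follows position `site`."""
    return sequence[: site + 1], sequence[site + 1 :]


substrate = "DEEMECSSHLPYK"  # NS4A-NS4B substrate peptide from Example 1
sites = candidate_cleavage_sites(substrate)
print(sites)                        # [5]
print(cleave(substrate, sites[0]))  # ('DEEMEC', 'SSHLPYK')
```

The filter only encodes a necessary pattern; a hit should be read as "compatible with the P1/P1' rule", not as a predicted cleavage site.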

Inhibition of the protease activity would therefore stop the proteolytic processing of the non-structural portion of the HCV polyprotein and, as a consequence, would prevent virus replication in infected cells. This sequence of events has been verified in a flavivirus, homologous to the hepatitis C virus, which infects cells in culture. In this case it has been possible to show that genetic manipulation producing a protease that is no longer capable of exerting its catalytic activity abolishes the ability of the virus to replicate (13). Furthermore, it has been widely demonstrated, both in vitro and in clinical studies, that compounds capable of interfering with the activity of the HIV protease are capable of inhibiting the replication of that virus (14).

Finally, there is evidence that the NS5 region of HCV, which, as mentioned above, has an RNA-dependent RNA polymerase activity, displays this function only after processing by the NS3 protease.

Therefore a substance capable of interfering with the proteolytic activity associated with the NS3 protein could be a new therapeutic agent. From this point of view, detailed knowledge of the three-dimensional structure of the protease takes on a great deal of importance, as it would allow both a greater understanding of the biological phenomena in which it is involved and the analysis, study and design of inhibitor molecules capable of interfering with the protease activity, thus paving the way for the development of pharmaceutical compositions suitable for the treatment of hepatitis C.

Nevertheless, determination of the structure, whether by NMR methods or by X-ray crystallography, requires large amounts of soluble protein, and at present it is not possible to meet this requirement. In fact, although the simplest and most economical way of obtaining large amounts of a desired polypeptide is expression of the corresponding gene in bacteria, and although numerous prokaryotic promoters and methods for maximising the expression of heterologous genes in E. coli are widely available, efficient production of the polypeptide in question, although necessary, might not be sufficient. Many recombinant proteins do not fold correctly when they are expressed in E. coli. The result is the synthesis of polypeptides that are either degraded in the host cell or accumulated in an insoluble form in the so-called inclusion bodies (15). Furthermore, in the case of extremely hydrophobic proteins, proteins of viral origin or proteins that are toxic for the bacterial cell (as is the case for certain proteases of viral origin), there are insurmountable difficulties in producing them in a native, soluble form.

In the case of the NS3 serine protease of the hepatitis C virus, owing to the conditions in which the protein is normally produced, it has not been possible to date to obtain in E. coli a native, soluble protease in amounts sufficient to enable study of the structure of this protein, which requires solutions containing the protein at millimolar concentrations.

It has now unexpectedly been found that these important limitations can be overcome by using the method according to the present invention. As will be seen from the following, this method is based on the unexpected discovery that the NS3 serine protease domain, in its native conformation, binds a Zn2+ ion.

Because, as mentioned above, the structure of the HCV NS3 protease is not yet known, a structural model of the protein was prepared to guide the experiments. However, the similarity of the NS3 protease to other serine proteases of known structure is extremely low (less than 15%), which does not allow a good sequence alignment and, as a result, does not allow construction of a three-dimensional model based solely on homology. For this reason, the available serine protease structures were used to build a multiple alignment of the structurally conserved regions and thereby to draw up a profile with which the sequence of the NS3 protease could subsequently be aligned. In this way it was possible to build an approximate three-dimensional model of the HCV NS3 protease (9, 16).
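The profile approach described above can be sketched in a simplified form. This is a hypothetical illustration, not the procedure used to build the model: it assumes ungapped, equal-length aligned segments, a uniform background distribution and a pseudocount of 1, and it slides the profile along the query without gaps, whereas practical profile alignment also scores insertions and deletions. The toy segments are invented.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(segments: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log-odds profile (one dict per column) from equal-length, ungapped aligned segments."""
    width = len(segments[0])
    if any(len(s) != width for s in segments):
        raise ValueError("segments must have equal length")
    background = 1.0 / len(AMINO_ACIDS)  # uniform background frequency (assumption)
    total = len(segments) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(width):
        counts = Counter(s[col] for s in segments)
        profile.append(
            {aa: math.log((counts[aa] + pseudocount) / total / background) for aa in AMINO_ACIDS}
        )
    return profile


def best_offset(profile: list[dict[str, float]], query: str) -> tuple[int, float]:
    """Score every ungapped placement of the profile on the query and return the best one."""
    width = len(profile)
    best = (-1, -math.inf)
    for start in range(len(query) - width + 1):
        window = query[start : start + width]
        # Unknown residue letters get the column's lowest score.
        score = sum(col.get(aa, min(col.values())) for col, aa in zip(profile, window))
        if score > best[1]:
            best = (start, score)
    return best


# Toy segments around a serine-protease-like motif (invented, not real alignment data).
profile = build_profile(["GDSGG", "GDSGG", "GDSGP"])
offset, score = best_offset(profile, "AAAGDSGGAAA")
print(offset)  # 3
```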

Recently, three new viruses responsible for human hepatitis have been discovered (17). These new viruses, known as GBV-A, GBV-B and GBV-C, share a polyprotein organisation with HCV (18, 19). Alignment of the region corresponding to NS3 in these three new viruses with that of various HCV serotypes identified several conserved amino acids. These residues comprise the amino acids of the active site, some glycines and prolines (probably involved in stabilising the structure of the protein), and three cysteines and one histidine (figure 1). In the model we propose for the NS3 protease, these last four residues are found in a region of the molecule opposite the active site, in close spatial proximity, and their relative positions are such that they form a binding site for a divalent metal ion, such as the Zn2+ ion (figure 2).
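Identifying residues that are invariant across an alignment, and picking out the cysteines and histidines among them as candidate metal ligands, can be sketched in Python as below. The alignment is a toy example, not the alignment of figure 1, the helper names are hypothetical, and gaps are assumed to be written as "-".

```python
def conserved_positions(alignment: list[str]) -> list[int]:
    """0-based columns in which every aligned sequence has the same residue (gaps excluded)."""
    width = len(alignment[0])
    if any(len(s) != width for s in alignment):
        raise ValueError("aligned sequences must have equal length")
    positions = []
    for col in range(width):
        residues = {s[col] for s in alignment}
        if len(residues) == 1 and "-" not in residues:
            positions.append(col)
    return positions


def candidate_metal_ligands(alignment: list[str]) -> list[tuple[int, str]]:
    """Conserved Cys/His columns, i.e. possible ligands of a bound divalent metal ion."""
    return [
        (col, alignment[0][col])
        for col in conserved_positions(alignment)
        if alignment[0][col] in "CH"
    ]


toy_alignment = ["ACHGC-", "ACHGCA", "SCHGCA"]  # invented sequences
print(conserved_positions(toy_alignment))      # [1, 2, 3, 4]
print(candidate_metal_ligands(toy_alignment))  # [(1, 'C'), (2, 'H'), (4, 'C')]
```

Conservation alone only nominates candidates; as the text explains, the argument for a metal site also relied on the four residues lying close together in the structural model.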

This observation was subsequently confirmed experimentally. In fact, as will be illustrated in greater detail in the examples, the HCV NS3 protease has a metal content equivalent to one mole of zinc per mole of protein and, as is the case in other proteins, the zinc is necessary for the protein to take on its native structure and become catalytically active (20, 21).

The fact that the NS3 protease has a binding site for a metal ion, and that this binding site is so well conserved even in viruses that are not phylogenetically close, opens the way to the study of antiviral therapeutic agents whose target is this very region of the protein. In fact, in the case of another viral protein that binds Zn2+ ions, namely the HIV nucleocapsid protein, it has been possible to identify compounds that interfere selectively with the binding of Zn2+ ions to the protein (22, 23), and these compounds have also been found to interfere with viral infection of cells grown in culture.

An object of the present invention is therefore to provide a method for high-yield expression of the HCV NS3 protease in a native form, that is to say as a protein containing a divalent metal ion, and in a highly soluble form, using heterologous expression systems such as E. coli cells transformed with suitable genetic constructs and cultivated in a medium enriched with salts containing divalent metal ions.

A further object of the present invention is to provide a general method allowing the preparation and isolation, in a native, pure and highly soluble form, of large amounts of polypeptides containing Zn2+, Co2+ or Cd2+ with the protease activity of HCV NS3.

An additional object of the present invention is to provide a method that allows the preparation and isolation, in a native, pure and highly soluble form, of large amounts of polypeptides with the protease activity of HCV NS3 that are at the same time labelled with stable heavy isotopes such as 13C or 15N, as required for experiments to determine the three-dimensional structure of the protein by NMR.

Finally, the present invention provides new genetic constructs for the expression, in E. coli cells, of modified polypeptides with the protease activity of HCV NS3, giving a high yield of the native and soluble form of the HCV NS3 protease.

These and other objects are achieved using one or more of the embodiments of the present invention described below.

In an embodiment of the invention, a procedure is provided for producing the NS3 serine protease domain in its native form, that is to say containing a divalent metal ion, which is necessary for the structural integrity of the protein. The innovation in the procedure consists in adding, to the medium in which the transformed bacterial cells are grown, compounds containing metals such as Zn, Co, Cd, Mn, Cu, Ni, Ag, Fe, Cr, Hg, Au, Pt or V. These compounds provide the culture medium with the ions required by the protein to take on its native structure. In this way the protein is found in its native, soluble form in the cytoplasm of the bacterial cells, instead of being held in inclusion bodies, from which it can only be obtained by applying difficult resolubilisation procedures.

In another embodiment of the invention, a procedure is provided that makes it possible to replace the zinc ion in the protease, which is spectroscopically silent, with other ions (for example Co2+ or Cd2+) that are spectroscopically active, so as to permit the study of possible inhibitors capable of coordinating the metal contained in the protein and therefore of disturbing the binding between the protein and the metal.

In another embodiment of the invention, the addition of divalent metal ions to a minimal culture medium containing glucose and ammonium salts enriched with 13C or 15N as the sole sources of carbon and nitrogen, respectively, makes it possible to obtain large amounts of soluble protein labelled with stable heavy isotopes such as 13C or 15N. This type of isotope enrichment is necessary to determine the structure using NMR techniques.
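One practical check on 15N labelling is the mass increase it produces, which mass spectrometry can confirm: each nitrogen atom adds the mass difference between 15N and 14N. The sketch below counts nitrogen atoms from a sequence (one backbone N per residue plus side-chain nitrogens) and assumes uniform labelling at a stated fractional enrichment. The helper names are hypothetical and the isotope masses are rounded standard values.

```python
# Nitrogen atoms per residue: 1 backbone N, plus side-chain N for R, H, K, N, Q, W.
NITROGENS = {aa: 1 for aa in "ACDEFGHIKLMNPQRSTVWY"}
NITROGENS.update({"R": 4, "H": 3, "K": 2, "N": 2, "Q": 2, "W": 2})

MASS_14N = 14.003074004  # Da
MASS_15N = 15.000108899  # Da


def nitrogen_count(sequence: str) -> int:
    """Total nitrogen atoms in a linear peptide with a free C-terminal carboxyl."""
    return sum(NITROGENS[aa] for aa in sequence.upper())


def mass_shift_15n(sequence: str, enrichment: float = 1.0) -> float:
    """Expected mass increase (Da) on 15N labelling at the given fractional enrichment."""
    return nitrogen_count(sequence) * (MASS_15N - MASS_14N) * enrichment


peptide = "DEEMECSSHLPYK"  # substrate peptide used in Example 1
print(nitrogen_count(peptide))            # 16
print(round(mass_shift_15n(peptide), 2))  # 15.95
```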

In a further embodiment of the present invention, polypeptide sequences are provided that contain a suitably modified NS3 serine protease domain of hepatitis C virus. These polypeptides are characterised in that they carry at their C-terminal end a sequence of extremely hydrophilic amino acids, for example a series of lysines, that are not present in the original sequence. This further new method substantially improves the solubility and integrity of the protein produced. These modified protease molecules are also to be considered a subject of the present invention.
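One simple way to see why a lysine tail makes a polypeptide more hydrophilic is the grand average of hydropathy (GRAVY), the mean Kyte-Doolittle value over all residues, where lower values mean more hydrophilic. This is only an illustrative proxy that the text does not use, and hydropathy alone does not predict solubility. The sketch applies it to the N-terminal sequence PITAYSSQ reported in Example 1, with a four-lysine tail appended directly to it purely for illustration; the helper names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic, negative = hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = sequence.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def add_lysine_tail(sequence: str, n_lysines: int = 4) -> str:
    """Append a C-terminal poly-lysine tail (four lysines by default, as in the asK4 constructs)."""
    return sequence + "K" * n_lysines


core = "PITAYSSQ"
print(round(gravy(core), 2))                   # -0.3
print(round(gravy(add_lysine_tail(core)), 2))  # -1.5
```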

Subjects of the present invention are therefore:

a) isolated and purified polypeptides containing the HCV NS3 serine protease domain, characterised in that they have at their C-terminal end a tail of at least three lysines.

b) A process for the preparation of polypeptides containing the HCV NS3 serine protease domain in a soluble form, for use in enzymological experiments and in determination of the three-dimensional structure of the enzyme both by NMR and by X-ray crystallography, comprising the following operations:

- transformation of a prokaryotic host cell with an expression vector containing a DNA sequence coding for a polypeptide with the proteolytic activity of the HCV NS3 protease;

- growth of the prokaryotic host cell on a special culture medium containing Zn2+ or, alternatively, salts of transition metals such as Co, Cd, Mn, Cu, Ni, Ag, Fe, Cr, Hg, Au, Pt or V;

- expression of the DNA sequence required to produce the chosen polypeptide;

- purification of the polypeptide without having to resort to resolubilisation protocols, and without the need for renaturation of the protein from inclusion bodies.

c) A process for the renaturation in vitro of the above polypeptides, characterised in that it comprises the following operations:

- transformation of a prokaryotic host cell with an expression vector containing a DNA sequence coding for a polypeptide with the proteolytic activity of the HCV NS3 protease;

- expression of the DNA sequence required to produce the chosen polypeptide;

- purification of the denatured and renatured polypeptide using buffers containing Zn2+ or, alternatively, salts of transition metals such as Co, Cd, Mn, Cu, Ni, Ag, Fe, Cr, Hg, Au, Pt or V.

d) Expression vectors for the production of the polypeptides represented by the sequences SEQ ID NO:1 to SEQ ID NO:4 with the proteolytic activity of HCV NS3, comprising: a polynucleotide coding for one of said polypeptides; regulation, transcription and translation sequences, operating in said host cell and operably linked to said polynucleotide; and, optionally, a selectable marker.

e) A prokaryotic cell transformed with an expression vector containing a DNA sequence coding for polypeptides with the proteolytic activity of the HCV NS3 protease, so as to allow said host cell to express the specific polypeptide coded by the chosen sequence.

Figure 1 shows the alignment of the HCV NS3 serine protease sequence with the corresponding sequences of the viruses GBV-A, GBV-B and GBV-C/HGV (Hcv, Hga, Hgb, Hgc) and with the poliovirus (Pol) 2A cysteine protease. Amino acids conserved in the HCV proteases and in the viruses GBV-A, GBV-B and GBV-C/HGV are shaded. The catalytic residues are underlined, and the residues that bind zinc are marked with a separate symbol.

Figure 2 shows a diagrammatic model of the NS3 serine protease domain. In particular, it shows the position within the structure of the amino acids involved in binding zinc (dark grey) and of the catalytic triad (light grey).

Figure 3 shows the effect of the zinc ion on HCV NS3 serine protease activity.

Figure 4 shows the effect of the zinc ion on the production of HCV NS3 protease as a soluble protein in E. coli in a minimal culture medium. Column 2 refers to the results of the experiment carried out on cells without induction of protease production (-IPTG). Columns 3, 4 and 5 show that, in the absence of ZnCl2 and following induction of protease production (+IPTG), the protein remains locked in the insoluble fraction (indicated by the abbreviation PT). By contrast, in the presence of ZnCl2 the protease is found entirely in the soluble fraction (indicated by the abbreviation SN).

Figure 5 shows the electronic spectra of the HCV NS3 protease. Figure 5a shows the visible and near-UV spectrum of the Co2+-protease. Figure 5b shows the UV absorption spectra of the Zn2+-protease and of the Cd2+-protease.

Strains of E. coli DH1/p bacteria transformed with the plasmids pT7-7(Pro BK-asK4), pT7-7(Pro J-asK4), pT7-7(Pro H-asK4) and pT7-7(Pro J8-asK4), coding for the amino acid sequences SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3 and SEQ ID NO:4, respectively, were deposited on August 8, 1996 with The National Collections of Industrial and Marine Bacteria Ltd (NCIMB), Aberdeen, Scotland, U.K., under accession numbers NCIMB 40821, NCIMB 40822, NCIMB 40823 and NCIMB 40824, respectively.

Up to this point a general description of the present invention has been given. With the aid of the following examples, a more detailed description of specific embodiments of the invention will now be given, with the aim of clarifying the objects, characteristics, advantages and methods of application thereof.

EXAMPLE 1

EXPRESSION AND PURIFICATION OF POLYPEPTIDES WITH THE
PROTEASE ACTIVITY OF HCV NS3 IN THEIR NATIVE SOLUBLE
FORM

The plasmids pT7-7(Pro BK-asK4), pT7-7(Pro H-asK4), pT7-7(Pro J-asK4) and pT7-7(Pro J8-asK4) were constructed to allow expression in E. coli of polypeptides having a sequence chosen from SEQ ID NO:1 to SEQ ID NO:4. The polypeptides contain the NS3 protease domain of various HCV isolates (BK, H, J and J8, respectively), with a "tail" of four lysines added at the C-terminal end.

pT7-7(Pro BK-asK4) contains the sequence of HCV-BK (EMBL data bank accession number: M58335) between nucleotides 3411 and 3950, cloned in the vector pT7-7.

pT7-7(Pro H-asK4) contains the sequence of HCV-H (EMBL data bank accession number: M67463) between nucleotides 3420 and 3959, cloned in the vector pT7-7.

pT7-7(Pro J-asK4) contains the sequence of HCV-J (EMBL data bank accession number: D90208) between nucleotides 3408 and 3947, cloned in the vector pT7-7.

pT7-7(Pro J8-asK4) contains the sequence of HCV-J8 (EMBL data bank accession number: D10988/D01221) between nucleotides 3432 and 3971, cloned in the vector pT7-7.
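Assuming the nucleotide ranges above are inclusive, each cloned interval spans 540 nucleotides, i.e. 180 codons, which matches the 180 amino-terminal amino acids of NS3 that contain the serine protease domain; with the four-lysine tail this gives 184 residues, not counting any residues contributed by the vector or primers. The short Python check below simply restates the coordinates given above; the helper name is hypothetical.

```python
# EMBL accession and inclusive nucleotide range for each construct, as given above.
CONSTRUCTS = {
    "pT7-7(Pro BK-asK4)": ("M58335", 3411, 3950),
    "pT7-7(Pro H-asK4)": ("M67463", 3420, 3959),
    "pT7-7(Pro J-asK4)": ("D90208", 3408, 3947),
    "pT7-7(Pro J8-asK4)": ("D10988/D01221", 3432, 3971),
}
LYSINE_TAIL = 4


def insert_summary(start: int, end: int) -> tuple[int, int, int]:
    """Return (nucleotides, codons, residues including the lysine tail) for an inclusive range."""
    nucleotides = end - start + 1
    if nucleotides % 3:
        raise ValueError("range is not a whole number of codons")
    codons = nucleotides // 3
    return nucleotides, codons, codons + LYSINE_TAIL


for name, (accession, start, end) in CONSTRUCTS.items():
    print(name, accession, insert_summary(start, end))  # each: (540, 180, 184)
```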

The expression vector pT7-7 is a derivative of pBR322 which contains, in addition to the β-lactamase gene and the ColE1 replication origin, the promoter and the ribosome binding site of the T7 bacteriophage φ10 gene (24).

The fragments coding for the HCV NS3 protease were cloned downstream of the T7 bacteriophage φ10 promoter, in reading frame with the first ATG codon of the gene 10 protein of phage T7, using methods known in the art.

The cDNA fragment containing the sequence of HCV-BK between nucleotides 3411 and 3950 was amplified by polymerase chain reaction (PCR), using the oligonucleotides PROT(BK-K4)S (SEQ ID NO:5) and PROT(BK-K4)AS (SEQ ID NO:6) as primers. The cDNA fragment so obtained was digested with the restriction enzyme NdeI and cloned in pT7-7, which had first been linearised with the restriction enzymes NdeI and SmaI.

The cDNA fragment containing the sequence of HCV-H between nucleotides 3420 and 3959 was amplified by PCR, using the oligonucleotides PROT(H-K4)S (SEQ ID NO:7) and PROT(H-K4)AS (SEQ ID NO:8) as primers. The cDNA fragment so obtained was digested with the restriction enzymes NdeI and EcoRI and cloned in pT7-7, which had first been linearised with the same restriction enzymes.

The cDNA fragment containing the sequence of HCV-J between nucleotides 3408 and 3947 was amplified by PCR, using the oligonucleotides PROT(J-K4)S (SEQ ID NO:9) and PROT(J-K4)AS (SEQ ID NO:10) as primers. The cDNA fragment so obtained was digested with the restriction enzymes NdeI and EcoRI and cloned in pT7-7, which had first been linearised with the same restriction enzymes.

The cDNA fragment containing the sequence of HCV-J8 between nucleotides 3432 and 3971 was amplified by PCR, using the oligonucleotides PROT(J8-K4)S (SEQ ID NO:11) and PROT(J8-K4)AS (SEQ ID NO:12) as primers. The cDNA fragment so obtained was digested with the restriction enzymes NdeI and EcoRI and cloned in pT7-7, which had first been linearised with the same restriction enzymes.
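The digestion steps above rely on the recognition sequences of the three enzymes: NdeI (CA^TATG), EcoRI (G^AATTC) and SmaI (CCC^GGG, blunt). The NdeI site contains an ATG, which is why it is commonly used to place an insert in frame with a start codon. The Python sketch below locates sites and top-strand cut positions (all three sites are palindromic, so scanning one strand suffices) and splits a sequence at those cuts. The test fragment is invented and does not correspond to any primer in the sequence listing; the helper names are hypothetical.

```python
# Recognition sequence and top-strand cut offset for each enzyme.
ENZYMES = {
    "NdeI": ("CATATG", 2),   # CA^TATG, 5' overhang TA
    "EcoRI": ("GAATTC", 1),  # G^AATTC, 5' overhang AATT
    "SmaI": ("CCCGGG", 3),   # CCC^GGG, blunt end
}


def find_cut_positions(dna: str, enzyme: str) -> list[int]:
    """0-based top-strand positions immediately after each cut."""
    site, offset = ENZYMES[enzyme]
    seq = dna.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def digest(dna: str, enzymes: list[str]) -> list[str]:
    """Split the top strand at every cut made by the given enzymes."""
    cuts = sorted({pos for enzyme in enzymes for pos in find_cut_positions(dna, enzyme)})
    fragments, previous = [], 0
    for pos in cuts:
        fragments.append(dna[previous:pos])
        previous = pos
    fragments.append(dna[previous:])
    return fragments


fragment = "AACATATGCCCAAAGAATTCAA"  # invented test sequence
print(find_cut_positions(fragment, "NdeI"))   # [4]
print(find_cut_positions(fragment, "EcoRI"))  # [15]
print(digest(fragment, ["NdeI", "EcoRI"]))    # ['AACA', 'TATGCCCAAAG', 'AATTCAA']
```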

The plasmids pT7-7(Pro BK-asK4), pT7-7(Pro H-asK4), pT7-7(Pro J-asK4) and pT7-7(Pro J8-asK4) containing NS3 sequences also contain the gene for β-lactamase, which can be used as a selection marker for E. coli cells transformed with these plasmids.

The plasmids are then transformed into the E. coli strain BL21(DE3), which is normally used for high-level expression of genes cloned in expression vectors containing the T7 promoter. In this strain the T7 RNA polymerase gene is carried by the bacteriophage λDE3, which is integrated into the chromosome of BL21 cells (25). Expression of the gene is induced by incubating the cultures, at an A600 of 0.7-0.9, with 0.4 mM isopropyl-1-thio-β-D-galactopyranoside (IPTG) for 3 hours at 20 °C in LB culture medium supplemented with ZnCl2 at a concentration that can vary from 50 µM to 1 mM. After the three hours the cells are harvested and washed in phosphate-buffered saline (20 mM sodium phosphate pH 7.5, 140 mM NaCl), after which they are resuspended in 25 mM sodium phosphate pH 7.5, 10% glycerol, 500 mM NaCl, 10 mM DTT, 0.5% CHAPS (10 ml per litre of culture medium). The cells are then lysed by passing them twice through a French pressure cell, and the homogenate obtained in this way is centrifuged at 100,000 × g for 1 hour; nucleic acids are removed by precipitation with 0.5% polyethylenimine. The supernatants are loaded onto a HiLoad 16/10 SP Sepharose High Performance column (Pharmacia) equilibrated with 50 mM sodium phosphate, pH 7.…, …, 0.1% CHAPS (buffer A). The column was washed repeatedly with buffer A and the protease was eluted by applying a gradient of 0 to 0.6 M NaCl. The fractions containing the protease were then collected and concentrated in a magnetically stirred ultrafiltration cell equipped with a YM-10 membrane (Amicon). The sample was then loaded onto an HR 26/60 HiLoad Superdex 75 column (Pharmacia), equilibrated with buffer A and operated at a flow rate of 1 ml/min.
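The ZnCl2 and IPTG concentrations used for induction translate into simple weighing or pipetting amounts. The sketch below assumes anhydrous ZnCl2 (about 136.3 g/mol) and IPTG (about 238.3 g/mol); the 1 M ZnCl2 stock solution in the example is a hypothetical choice, not part of the described procedure, and the helper names are illustrative.

```python
ZNCL2_MW = 136.3  # g/mol, anhydrous ZnCl2 (approximate)
IPTG_MW = 238.3   # g/mol (approximate)


def mass_mg(final_molar: float, volume_l: float, mw_g_per_mol: float) -> float:
    """Milligrams of solid needed to reach final_molar in volume_l litres."""
    return final_molar * volume_l * mw_g_per_mol * 1000.0


def stock_volume_ul(final_molar: float, volume_l: float, stock_molar: float) -> float:
    """Microlitres of stock to add (C1*V1 = C2*V2), neglecting the added volume."""
    return final_molar * volume_l / stock_molar * 1e6


# Per litre of LB: both ends of the ZnCl2 range, and 0.4 mM IPTG for induction.
for zn in (50e-6, 1e-3):
    print(f"ZnCl2 {zn * 1e6:.0f} uM: {mass_mg(zn, 1.0, ZNCL2_MW):.1f} mg "
          f"or {stock_volume_ul(zn, 1.0, 1.0):.0f} ul of 1 M stock")
print(f"IPTG 0.4 mM: {mass_mg(0.4e-3, 1.0, IPTG_MW):.1f} mg")
```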

The fractions containing NS3 were collected and further purified on an HR 5/5 Mono S column (Pharmacia), equilibrated with buffer B and operated at a flow rate of 1 ml/min. The protease was eluted from the column in pure form by applying a linear gradient of 0-0.6 M NaCl in buffer A.

After this step the protein was stored as stocks at concentrations of 50-150 µM at -80 °C after freezing in liquid nitrogen. The concentration of the protein was estimated from its absorbance at 280 nm, using an extinction coefficient derived either from the sequence data or from quantitative amino acid analysis. Both methods gave the same results within an error of 10%. The purity of the enzyme was checked by SDS-polyacrylamide gel electrophoresis and by HPLC on a reversed-phase Vydac C4 column (4.6 × 250 mm, 5 µm, 300 Å). The eluents used were H2O/0.1% TFA (A) and acetonitrile/0.1% TFA (B), with a linear gradient from 3% to 95% B over 60 minutes. Analysis of the N-terminal end by Edman degradation on a gas-phase sequencer (Applied Biosystems model 470A) and by mass spectrometry revealed that more than 96% of the purified protein has the N-terminal sequence PITAYSSQ. The remaining 3% has the sequence MAPITAYSSQ, as predicted from the nucleotide sequence data.
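Estimating concentration from A280 uses the Beer-Lambert law, c = A / (ε·l). The text does not state which sequence-based method was used; the sketch below assumes the widely used per-residue molar extinction coefficients of Pace and co-workers (5500 M⁻¹cm⁻¹ per Trp, 1490 per Tyr, 125 per disulfide) and defaults to no disulfides because the buffers contain DTT. The example applies it to the substrate peptide of Example 1; the absorbance value is invented and the helper names are hypothetical.

```python
EPS_TRP = 5500     # M^-1 cm^-1 per tryptophan
EPS_TYR = 1490     # M^-1 cm^-1 per tyrosine
EPS_CYSTINE = 125  # M^-1 cm^-1 per disulfide bond


def extinction_coefficient(sequence: str, disulfides: int = 0) -> int:
    """Molar extinction coefficient at 280 nm estimated from the sequence."""
    seq = sequence.upper()
    return seq.count("W") * EPS_TRP + seq.count("Y") * EPS_TYR + disulfides * EPS_CYSTINE


def molar_concentration(a280: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: c = A / (epsilon * l), in mol/l."""
    return a280 / (epsilon * path_cm)


eps = extinction_coefficient("DEEMECSSHLPYK")  # one Tyr, no Trp
print(eps)  # 1490
print(f"{molar_concentration(0.745, eps) * 1e6:.0f} uM")  # 500 uM for a made-up A280 of 0.745
```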

In order to measure the enzymatic activity of the purified protein, a synthetic peptide of 13 amino acids derived from the cleavage sequence of the NS4A-NS4B junction (DEEMECSSHLPYK) was used as substrate. A peptide of 14 amino acids corresponding to the central hydrophobic region of the protein NS4A (positions 21 to 34; Pep4A21-34: GSVVIVGRIILSGR) was used as protease cofactor.

The peptides were synthesised by solid-phase synthesis based on Fmoc chemistry. After washing and deprotection, the crude peptides were purified by HPLC to 98% purity, and their identity was confirmed by mass spectrometry. Peptide stock solutions were prepared in DMSO and stored at -80 °C; their concentrations were determined by quantitative amino acid analysis of samples hydrolysed with HCl.

The cleavage assays were carried out using 300 nM-1.6 µM of enzyme in 30 µl of 50 mM Tris pH 7.5, 50% glycerol, 2% CHAPS, 30 mM DTT and appropriate amounts of substrate and/or NS4A peptide at 22 °C. The reaction was stopped by adding 70 µl of H2O containing 0.1% TFA. Cleavage of the peptide substrate was determined by HPLC using a Merck-Hitachi chromatograph: 90 µl of each sample were injected into a reversed-phase LiChrospher C18 cartridge column (4 × 125 mm, 5 µm, Merck) and the fragments were separated using an acetonitrile gradient of 3-100% at 2%/min. Peaks were identified by following both the absorbance at 220 nm and the tyrosine fluorescence (λex = 260 nm, λem = 305 nm).
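Because the only tyrosine in the substrate lies in the C-terminal cleavage product (SSHLPYK, assuming cleavage at the C|S bond), the intact substrate and that product each carry one tyrosine, so their fluorescence peak areas can be compared directly if one assumes equal fluorescence per molecule. Under that assumption, and ignoring detector nonlinearity, the extent of cleavage and an average rate follow from two peak areas as sketched below; the numbers are invented and the function names hypothetical.

```python
def fraction_cleaved(product_area: float, substrate_area: float) -> float:
    """Fraction of substrate converted, from Tyr-fluorescence peak areas of
    the C-terminal product and of the remaining intact substrate."""
    total = product_area + substrate_area
    if total <= 0:
        raise ValueError("no fluorescent peaks detected")
    return product_area / total


def average_rate_um_per_min(product_area: float, substrate_area: float,
                            initial_substrate_um: float, minutes: float) -> float:
    """Product formed per minute (uM/min); approximates the initial rate only at low conversion."""
    return fraction_cleaved(product_area, substrate_area) * initial_substrate_um / minutes


print(fraction_cleaved(25.0, 75.0))                      # 0.25
print(average_rate_um_per_min(25.0, 75.0, 100.0, 10.0))  # 2.5
```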

Tables 1 and 2 give the solubility and yield data for the NS3 protease of various HCV isolates. Table 1 gives the data for production of the various forms of the protease both with and without the addition of four lysines at the C-terminal end, and both with and without the addition of ZnCl2 to the culture medium. The data are expressed as the percentage of protein recovered in the soluble portion of the cell extracts and in the inclusion bodies. Table 2 gives the yields and solubility of the various forms of the protease purified from E. coli cells grown in the presence of ZnCl2. As can be seen from these results, the modified proteases (BK-asK4, J-asK4, H-asK4) are between 10 and 20 times more soluble and, when expressed in a culture medium containing an excess of ZnCl2, give a yield up to 10 times greater than the respective proteases without the lysine tail.



TABLE 1

Construct      Culture medium    Soluble portion    Inclusion bodies
Pro J          LB                5%                 95%
Pro J-asK4     LB                20%                80%
Pro J          LB + ZnCl2        99%                <1%
Pro J-asK4     LB + ZnCl2        99%                <1%
Pro H          LB                <2%                >98%
Pro H-asK4     LB                3-4%               >95%
Pro H          LB + ZnCl2        5%                 95%
Pro H-asK4     LB + ZnCl2        50%                50%

TABLE 2

Construct      Yield (mg/l medium)    Solubility (mg/ml)
Pro BK         1-2                    1-2
Pro BK-asK4    10-15                  >40
Pro H          0.1-0.2                1-2
Pro H-asK4     1-2                    >40
Pro J          1-2                    0.5-1
Pro J-asK4     15-20                  >10




EXAMPLE 2

The polypeptides purified as described in examples 1, 3 and 5 were further dialysed against buffers containing a chelating agent, in order to remove any metal ions bound to the protein, and their metal content was determined by atomic absorption spectrometry using a Perkin-Elmer instrument. The glassware used for analysis of the metal content was washed with 30% nitric acid and rinsed thoroughly with deionised water. The protease (at a concentration of 4 mg/ml) was dialysed for at least 16 hours against a buffer containing 50 mM Tris/HCl pH 7.5, 3 mM DTT, 10% glycerol, 0.1% CHAPS. Chelex-100 resin (2.5 g/l) was held in suspension in the dialysis buffer to prevent contamination by adventitious metal ions. The protein was then hydrolysed with nitric acid and used to determine the metal content. Standardised Zn2+, Co2+ and Cd2+ solutions were purchased from Merck.

The metal content was found to be 1 g-atom per mole of enzyme (see table 3; n.d.: not determined), with the exception of the apoprotein, which has a negligible metal content.



TABLE 3

Protein      Zn (g-atoms/mole)   Co (g-atoms/mole)   Cd (g-atoms/mole)
Zn2+-NS3     1.09                n.d.                n.d.
Apo-NS3      0.02                n.d.                n.d.
Co2+-NS3     0.19                0.90                n.d.
Cd2+-NS3     0.09                n.d.                1.15

n.d.: not determined
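Table 3 can be read as a stoichiometry check: each metal-substituted enzyme should carry close to one g-atom of its intended metal, and the apoprotein close to none. The Python sketch below encodes the table (n.d. entries as None) and reports, for each protein, the predominant measured metal and the total measured metal. The 0.1 g-atom/mole cut-off for "negligible" is an illustrative assumption, not a value from the source.

```python
from typing import Dict, Optional

Row = Dict[str, Optional[float]]

# g-atoms of metal per mole of enzyme, transcribed from Table 3; None stands for n.d.
TABLE_3: Dict[str, Row] = {
    "Zn2+-NS3": {"Zn": 1.09, "Co": None, "Cd": None},
    "Apo-NS3": {"Zn": 0.02, "Co": None, "Cd": None},
    "Co2+-NS3": {"Zn": 0.19, "Co": 0.90, "Cd": None},
    "Cd2+-NS3": {"Zn": 0.09, "Co": None, "Cd": 1.15},
}

NEGLIGIBLE = 0.1  # illustrative threshold (assumption), in g-atoms per mole

def determined(row: Row) -> Dict[str, float]:
    """Keep only the metals that were actually measured."""
    return {metal: value for metal, value in row.items() if value is not None}

def main_metal(row: Row) -> Optional[str]:
    """Metal with the highest measured content, or None if every measured value is negligible."""
    measured = determined(row)
    metal = max(measured, key=lambda m: measured[m])
    return metal if measured[metal] >= NEGLIGIBLE else None

for protein, row in TABLE_3.items():
    total = sum(determined(row).values())
    print(f"{protein:<10} main metal: {main_metal(row)}  total measured: {total:.2f} g-atom/mol")
```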



EXAMPLE 3

PROCEDURE FOR THE RENATURATION OF THE NS3 PROTEASE IN THE PRESENCE OF ZINC

To ascertain whether or not zinc is required for HCV NS3 serine protease activity, its proteolytic activity was first measured on a synthetic substrate peptide in the presence of increasing concentrations of EDTA or of 1,10-phenanthroline. Neither compound inhibits proteolysis by NS3 at concentrations lower than 1 mM, and above this concentration both EDTA and 1,10-phenanthroline show only a modest inhibition of NS3 activity. However, a similar inhibition was obtained in control experiments using a compound structurally similar to 1,10-phenanthroline that is not capable of chelating zinc ions, and activity was not restored in the presence of an excess of Zn2+ ions. These results suggest either that zinc is not required for enzymatic activity or that it is so strongly bound to the protein that it cannot be removed by treatment with chelating agents. It was therefore decided to prepare a protein containing no zinc (apoprotein) and to measure its biochemical activity in the absence and in the presence of this metal. Bound zinc cannot be removed by dialysis against chelators at a pH above 7, whereas prolonged dialysis of the enzyme at a pH below 5 in the presence of 10 mM EDTA causes a loss of zinc accompanied by irreversible precipitation of the sample. These observations suggest that the zinc is strongly bound and that it is essential for the structural integrity of the protein. In order to facilitate the release of zinc, the apoprotein was obtained by the following procedure: 1.7 mg of NS3 protease were denatured by addition of TFA to a final concentration of 1%. The denatured protein was then purified on a Resource RPC 3 ml column using a 0% to 85% acetonitrile gradient in the presence of 0.1% TFA. The flow rate was 2 ml/min and the gradient volume was 40 ml. The zinc content of the apoprotein was found to be negligible. The enzymatic activity of the apoprotein was then tested in the presence and in the absence of zinc.
The apoprotein was diluted to a final concentration of 60 nM in the activity buffer containing the concentrations of ZnCl2 shown in the figure, and incubated for 1 hour at 22 °C. The reaction was then started by adding the substrate peptide and allowed to proceed before the measurements were taken. As shown in the figure, reconstitution of the enzymatic activity depends on the concentration of zinc ions in the buffer. Maximum reactivation was observed at a ZnCl2 concentration of 25 µM. At this concentration the enzymatic activity is approximately 50% of that of the zinc-containing protease diluted to the same final concentration. This experiment gives unequivocal proof that zinc is necessary for the enzyme to be structurally complete and active, and it also provides a method for reconstitution of NS3 serine protease activity starting from the apoprotein.
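The apoprotein purification in this example uses a 0% to 85% acetonitrile gradient run over 40 ml at 2 ml/min. Assuming the gradient is linear (the text does not state this explicitly), the Python sketch below derives its duration and slope and the acetonitrile percentage at a given time. The `Gradient` class and its field names are hypothetical, introduced only for illustration.

```python
from typing import NamedTuple

class Gradient(NamedTuple):
    start_pct: float        # % acetonitrile at the start of the gradient
    end_pct: float          # % acetonitrile at the end of the gradient
    volume_ml: float        # total gradient volume
    flow_ml_per_min: float  # column flow rate

    @property
    def duration_min(self) -> float:
        return self.volume_ml / self.flow_ml_per_min

    @property
    def slope_pct_per_min(self) -> float:
        return (self.end_pct - self.start_pct) / self.duration_min

    def acetonitrile_at(self, minutes: float) -> float:
        """% acetonitrile at a given time, assuming a linear gradient clamped to its end points."""
        t = min(max(minutes, 0.0), self.duration_min)
        return self.start_pct + self.slope_pct_per_min * t

# Values from the apoprotein purification: 0% to 85% acetonitrile, 40 ml at 2 ml/min.
rpc = Gradient(start_pct=0.0, end_pct=85.0, volume_ml=40.0, flow_ml_per_min=2.0)
print(f"{rpc.duration_min:g} min, {rpc.slope_pct_per_min:g} %/min, "
      f"{rpc.acetonitrile_at(10.0):g}% acetonitrile after 10 min")
```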

EXAMPLE 4

DETERMINATION OF THE THREE-DIMENSIONAL STRUCTURE WITH NMR TECHNIQUES

The discovery that HCV NS3 protease contains a structural zinc atom has been used to increase the production of soluble protein in bacterial cells (E. coli) and therefore to produce a protein in a form that can be used for experiments aimed at determining its structure by means of NMR.

In effect, determination of structure by NMR involves metabolic labelling with 15N and 13C, which must be carried out on a minimum culture medium, for example modified M9 containing (NH4)2SO4, ampicillin (50 µg/ml), glucose (4 g/l) and FeSO4·7H2O (13 µM). Production in this culture medium, which does not include zinc salts in its composition, inevitably results in insoluble protein, whereas the addition of ZnCl2 results in the production of a completely soluble protease. In this way it is possible to produce a labelled protein using (15NH4)2SO4 as the nitrogen source and 13C-glucose as the carbon source.

Following this new procedure, a protein has been obtained that remains soluble in the cytoplasm and is not sequestered in inclusion bodies, as was the case with the old procedures. Resolubilisation procedures therefore become unnecessary, which is a considerable advantage, as these procedures have an extremely variable yield, require tightly controlled conditions and frequently cause irreversible alterations in the protein. Figure 4 shows that the protease (approximately 21 kDa, indicated in the figure by an arrow) is produced as an insoluble aggregate (PT) when the bacterial cells are grown in minimum culture medium without zinc (lanes 3, 4 and 5). Conversely, if ZnCl2 is added to the culture medium at a concentration of 50 mM, the protein is found in the soluble fraction (SN) (lanes 6, 7 and 8) and disappears from the insoluble fraction (PT).
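Figure 4 places the protease band at approximately 21 kDa. As a rough cross-check, the Python sketch below computes the average molecular mass of Pro BK-asK4 (SEQ ID NO: 1) from standard average residue masses and counts its C-terminal lysines. Which construct was run in figure 4 is not stated in this example, so using SEQ ID NO: 1 is an assumption; apparent masses read from SDS gels can also differ somewhat from calculated ones. By this calculation the construct comes to roughly 19.7 kDa.

```python
from typing import List

# Standard average residue masses in daltons (residue = amino acid minus one water).
AVG_RESIDUE_MASS_DA = {
    "Gly": 57.0519, "Ala": 71.0788, "Ser": 87.0782, "Pro": 97.1167, "Val": 99.1326,
    "Thr": 101.1051, "Cys": 103.1388, "Leu": 113.1594, "Ile": 113.1594, "Asn": 114.1038,
    "Asp": 115.0886, "Gln": 128.1307, "Lys": 128.1741, "Glu": 129.1155, "Met": 131.1926,
    "His": 137.1411, "Phe": 147.1766, "Arg": 156.1875, "Tyr": 163.1760, "Trp": 186.2132,
}
WATER_DA = 18.01524  # one water per chain for the free N- and C-termini

# Pro BK-asK4, SEQ ID NO: 1, transcribed from the sequence listing
SEQ_ID_NO_1 = """
Met Ala Pro Ile Thr Ala Tyr Ser Gln Gln Thr Arg Gly Leu Leu Gly
Cys Ile Ile Thr Ser Leu Thr Gly Arg Asp Lys Asn Gln Val Glu Gly
Glu Val Gln Val Val Ser Thr Ala Thr Gln Ser Phe Leu Ala Thr Cys
Val Asn Gly Val Cys Trp Thr Val Tyr His Gly Ala Gly Ser Lys Thr
Leu Ala Gly Pro Lys Gly Pro Ile Thr Gln Met Tyr Thr Asn Val Asp
Gln Asp Leu Val Gly Trp Gln Ala Pro Pro Gly Ala Arg Ser Leu Thr
Pro Cys Thr Cys Gly Ser Ser Asp Leu Tyr Leu Val Thr Arg His Ala
Asp Val Ile Pro Val Arg Arg Arg Gly Asp Ser Arg Gly Ser Leu Leu
Ser Pro Arg Pro Val Ser Tyr Leu Lys Gly Ser Ser Gly Gly Pro Leu
Leu Cys Pro Ser Gly His Ala Val Gly Ile Phe Arg Ala Ala Val Cys
Thr Arg Gly Val Ala Lys Ala Val Asp Phe Val Pro Val Glu Ser Met
Glu Thr Thr Met Arg Ala Ser Lys Lys Lys Lys
"""

def residues(listing: str) -> List[str]:
    """Split a three-letter listing into residue codes."""
    return listing.split()

def average_mass_da(chain: List[str]) -> float:
    return sum(AVG_RESIDUE_MASS_DA[r] for r in chain) + WATER_DA

def c_terminal_lysines(chain: List[str]) -> int:
    """Length of the uninterrupted run of Lys at the C-terminus."""
    count = 0
    for residue in reversed(chain):
        if residue != "Lys":
            break
        count += 1
    return count

chain = residues(SEQ_ID_NO_1)
print(len(chain), c_terminal_lysines(chain), f"{average_mass_da(chain) / 1000:.1f} kDa")
```

The lysine count doubles as a check of the tail required by claim 1 (at least three lysines).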
EXAMPLE 5

REPLACEMENT OF THE Zn2+ BOUND TO NS3 WITH SPECTROSCOPIC PROBES SUCH AS Co2+ OR Cd2+

The Zn2+ binding site of the HCV NS3 protease can be studied by replacing the zinc with metals that make spectroscopic studies possible. The tight binding of the structural zinc to the enzyme makes it difficult to remove the metal and replace it in vitro. Zn2+ was therefore replaced by Co2+ and Cd2+ through incorporation in vivo. Bacterial cells (E. coli) were transformed with an appropriate expression vector and grown in minimum culture medium containing 100 mM potassium phosphate at pH 7.0, 0.5 mM MgSO4, 0.5 mM CaCl2, 13 µM FeSO4, 7 µM thiamine and 6 µM biotin. Glucose (4 g/l) and (NH4)2SO4 (1 g/l) were used as sources of carbon and nitrogen, respectively. To reduce the amount of Zn2+ in the culture medium, the phosphate buffer was passed through a Chelex-100 column. To produce Co2+-NS3 or Cd2+-NS3, 50 mM CoCl2 or CdCl2, respectively, was added 20 minutes before the addition of IPTG. The Co2+- and Cd2+-proteases were purified by the procedure described in example 1, except that all the buffers used were treated with Chelex-100 resin (2.5 g/l) and DTT was omitted.

The addition of CoCl2 or CdCl2 to the culture medium still results in production of the soluble enzyme, which indicates that Co2+ and Cd2+ ions can replace zinc in the metal-binding site of the protease.

The proteases containing Co2+ and Cd2+ were subjected to electronic absorption spectroscopy. The Co2+-containing protease shows a typical absorption spectrum in the visible region (figure 6a), which indicates a binding site with tetrahedral geometry (26). The two main bands at 640 nm and 685 nm and the minima at 585 nm and 740 nm indicate d-d transitions. The energies of these transitions and the molar extinction coefficients are characteristic of complexes with a distorted tetrahedral co-ordination geometry (27). The d-d transition energy is consistent with mixed sulphur-nitrogen co-ordination. Furthermore, the centroid of the band corresponding to the d-d transition indicates a Co2+ complex with S3N co-ordination (26). A typical S -> Co2+ charge transfer band was observed at around 365 nm (figure 6a), implying that the metal ion is co-ordinated by thiolates. In accordance with these data, the UV absorbance spectrum of the Cd2+-protease (figure 6b) shows an increase in absorbance at around 250 nm, which is probably due to an S -> Cd2+ charge transfer band (28). In conclusion, spectroscopic analysis of the Co2+- and Cd2+-proteases is completely consistent with the three-dimensional model we have proposed, in which the metal binding site is made up of the thiol groups of three cysteines and a nitrogen from the side chain of a histidine. Each of the residues that, according to the model, form the metal binding site has been changed to alanine and, as expected, none of the resulting mutants can be expressed in soluble form in E. coli.
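Ligand-field arguments such as the band-centroid criterion above are usually made on an energy scale (wavenumbers) rather than in nanometres. The Python sketch below converts the reported band positions to cm^-1 and, as a deliberately crude stand-in for the centroid, takes the unweighted energy midpoint of the two d-d maxima. A proper centroid is intensity-weighted over the whole band envelope, which this example cannot reproduce without the spectrum itself.

```python
from typing import Iterable

NM_PER_CM = 1e7  # 1 cm = 10^7 nm

def to_wavenumber(wavelength_nm: float) -> float:
    """Wavenumber in cm^-1 for a wavelength in nm."""
    return NM_PER_CM / wavelength_nm

def energy_midpoint_nm(band_maxima_nm: Iterable[float]) -> float:
    """Unweighted mean of band maxima on the energy (cm^-1) scale, converted back to nm.

    Only a rough approximation of a band centroid (which is intensity-weighted).
    """
    wavenumbers = [to_wavenumber(w) for w in band_maxima_nm]
    return NM_PER_CM / (sum(wavenumbers) / len(wavenumbers))

bands = [("Co2+ d-d", 640), ("Co2+ d-d", 685), ("S -> Co2+ CT", 365), ("S -> Cd2+ CT", 250)]
for label, nm in bands:
    print(f"{label:<14}{nm:>5} nm  {to_wavenumber(nm):8.0f} cm^-1")
print(f"energy midpoint of the two d-d maxima: {energy_midpoint_nm([640, 685]):.0f} nm")
```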
SEQUENCE LISTING

GENERAL INFORMATION

(i) APPLICANT:

ISTITUTO DI RICERCHE DI BIOLOGIA MOLECOLARE P. ANGELETTI S.p.A.

(ii) TITLE OF INVENTION: "SOLUBLE POLYPEPTIDES WITH ACTIVITY OF THE NS3 SERINE PROTEASE OF HEPATITIS C VIRUS, AND PROCESS FOR THEIR PREPARATION AND ISOLATION"

(iii) NUMBER OF SEQUENCES: 12

(iv) MAILING ADDRESS:

(A) ADDRESSEE: Societa' Italiana Brevetti

(B) STREET: Piazza di Pietra, 39

(C) CITY: Rome

(D) COUNTRY: Italy

(E) POST CODE: I-00186

(v) COMPUTER-READABLE FORM:

(A) TYPE OF SUPPORT: Floppy disk 3.5" 1.44 MBYTES

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS Rev. 5.0

(D) SOFTWARE: Microsoft Word 6.0

(viii) AGENT INFORMATION

(A) NAME: DI CERBO Mario (Dr.)

(B) REFERENCE: RM/X88878/PC-DC

(ix) TELECOMMUNICATIONS INFORMATION

(A) TELEPHONE: 06/6785941

(B) TELEFAX: 06/6794692

(C) TELEX: 612287 ROPAT



(1) INFORMATION ON SEQUENCE SEQ ID NO: 1:
(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 187 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ix) FEATURE

(A) NAME: Pro BK-asK4

(D) OTHER INFORMATION: sequence for the NS3 protease of HCV isolate BK.

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
Met Ala Pro Ile Thr Ala Tyr Ser Gln Gln Thr Arg Gly Leu Leu Gly
1               5                   10                  15

Cys Ile Ile Thr Ser Leu Thr Gly Arg Asp Lys Asn Gln Val Glu Gly
            20                  25                  30

Glu Val Gln Val Val Ser Thr Ala Thr Gln Ser Phe Leu Ala Thr Cys
        35                  40                  45

Val Asn Gly Val Cys Trp Thr Val Tyr His Gly Ala Gly Ser Lys Thr
    50                  55                  60

Leu Ala Gly Pro Lys Gly Pro Ile Thr Gln Met Tyr Thr Asn Val Asp
65                  70                  75                  80

Gln Asp Leu Val Gly Trp Gln Ala Pro Pro Gly Ala Arg Ser Leu Thr
                85                  90                  95

Pro Cys Thr Cys Gly Ser Ser Asp Leu Tyr Leu Val Thr Arg His Ala
            100                 105                 110

Asp Val Ile Pro Val Arg Arg Arg Gly Asp Ser Arg Gly Ser Leu Leu
        115                 120                 125

Ser Pro Arg Pro Val Ser Tyr Leu Lys Gly Ser Ser Gly Gly Pro Leu
    130                 135                 140

Leu Cys Pro Ser Gly His Ala Val Gly Ile Phe Arg Ala Ala Val Cys
145                 150                 155                 160

Thr Arg Gly Val Ala Lys Ala Val Asp Phe Val Pro Val Glu Ser Met
                165                 170                 175

Glu Thr Thr Met Arg Ala Ser Lys Lys Lys Lys
            180                 185
(2) INFORMATION ON SEQUENCE SEQ ID NO: 2:
(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 187 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ix) FEATURE

(A) NAME: Pro H-asK4

(D) OTHER INFORMATION: sequence for the NS3 protease of HCV isolate H.

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
Met Ala Pro Ile Thr Ala Tyr Ala Gln Gln Thr Arg Gly Leu Leu Gly
1               5                   10                  15

Cys Ile Ile Thr Ser Leu Thr Gly Arg Asp Lys Asn Gln Val Glu Gly
            20                  25                  30

Glu Val Gln Ile Val Ser Thr Ala Thr Gln Thr Phe Leu Ala Thr Cys
        35                  40                  45

Ile Asn Gly Val Cys Trp Thr Val Tyr His Gly Ala Gly Thr Arg Thr
    50                  55                  60

Ile Ala Ser Pro Lys Gly Pro Val Ile Gln Met Tyr Thr Asn Val Asp
65                  70                  75                  80

Gln Asp Leu Val Gly Trp Pro Ala Pro Gln Gly Ser Arg Ser Leu Thr
                85                  90                  95

Pro Cys Thr Cys Gly Ser Ser Asp Leu Tyr Leu Val Thr Arg His Ala
            100                 105                 110

Asp Val Ile Pro Val Arg Arg Arg Gly Asp Ser Arg Gly Ser Leu Leu
        115                 120                 125

Ser Pro Arg Pro Ile Ser Tyr Leu Lys Gly Ser Ser Gly Gly Pro Leu
    130                 135                 140

Leu Cys Pro Ala Gly His Ala Val Gly Leu Phe Arg Ala Ala Val Cys
145                 150                 155                 160

Thr Arg Gly Val Ala Lys Ala Val Asp Phe Ile Pro Val Glu Asn Leu
                165                 170                 175

Glu Thr Thr Met Arg Ala Ser Lys Lys Lys Lys
            180                 185

(3) INFORMATION ON SEQUENCE SEQ ID NO: 3:
(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 187 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ix) FEATURE

(A) NAME: Pro J-asK4

(D) OTHER INFORMATION: sequence for the NS3 protease of HCV isolate J.

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
Met Ala Pro Ile Thr Ala Tyr Ser Gln Gln Thr Arg Gly Leu Leu Gly
1               5                   10                  15

Cys Ile Ile Thr Ser Leu Thr Gly Arg Asp Lys Asn Gln Val Asp Gly
            20                  25                  30

Glu Val Gln Val Leu Ser Thr Ala Thr Gln Ser Phe Leu Ala Thr Cys
        35                  40                  45

Val Asn Gly Val Cys Trp Thr Val Tyr His Gly Ala Gly Ser Lys Thr
    50                  55                  60

Leu Ala Gly Pro Lys Gly Pro Ile Thr Gln Met Tyr Thr Asn Val Asp
65                  70                  75                  80

Gln Asp Leu Val Gly Trp Pro Ala Pro Pro Gly Ala Arg Ser Met Thr
                85                  90                  95

Pro Cys Thr Cys Gly Ser Ser Asp Leu Tyr Leu Val Thr Arg His Ala
            100                 105                 110

Asp Val Val Pro Val Arg Arg Arg Gly Asp Ser Arg Gly Ser Leu Leu
        115                 120                 125

Ser Pro Arg Pro Ile Ser Tyr Leu Lys Gly Ser Ser Gly Gly Pro Leu
    130                 135                 140

Leu Cys Pro Ser Gly His Val Val Gly Ile Phe Arg Ala Ala Val Cys
145                 150                 155                 160

Thr Arg Gly Val Ala Lys Ala Val Asp Phe Ile Pro Val Glu Ser Met
                165                 170                 175

Glu Thr Thr Met Arg Ala Ser Lys Lys Lys Lys
            180                 185
(4) INFORMATION ON SEQUENCE SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 186 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ix) FEATURE

(A) NAME: Pro J8-asK4

(D) OTHER INFORMATION: sequence for the NS3 protease of HCV isolate J8.

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Ala Pro Ile Thr Ala Tyr Thr Gln Gln Thr Arg Gly Leu Leu Gly Ala
1               5                   10                  15

Ile Val Val Ser Leu Thr Gly Arg Asp Lys Asn Glu Gln Ala Gly Gln
            20                  25                  30

Val Gln Val Leu Ser Ser Val Thr Gln Thr Phe Leu Gly Thr Ser Ile
        35                  40                  45

Ser Gly Val Leu Trp Thr Val Tyr His Gly Ala Gly Asn Lys Thr Leu
    50                  55                  60

Ala Gly Pro Lys Gly Pro Val Thr Gln Met Tyr Thr Ser Ala Glu Gly
65                  70                  75                  80

Asp Leu Val Gly Trp Pro Ser Pro Pro Gly Thr Lys Ser Leu Asp Pro
                85                  90                  95

Cys Thr Cys Gly Ala Val Asp Leu Tyr Leu Val Thr Arg Asn Ala Asp
            100                 105                 110

Val Ile Pro Val Arg Arg Lys Asp Asp Arg Arg Gly Ala Leu Leu Ser
        115                 120                 125

Pro Arg Pro Leu Ser Thr Leu Lys Gly Ser Ser Gly Gly Pro Val Leu
    130                 135                 140

Cys Ser Arg Gly His Ala Val Gly Leu Phe Arg Ala Ala Val Cys Ala
145                 150                 155                 160

Arg Gly Val Ala Lys Ser Ile Asp Phe Ile Pro Val Glu Ser Leu Asp
                165                 170                 175

Val Ala Thr Arg Ala Ser Lys Lys Lys Lys
            180                 185
(5) INFORMATION ON SEQUENCE SEQ ID NO: 5:
(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 26 nucleotides

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Synthetic DNA

(iv) ANTISENSE: No

(vii) IMMEDIATE SOURCE: oligonucleotide synthesiser

(ix) FEATURE

(A) NAME: PROT(BK-K4)S
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

GCATACATAT GGCGCCCATC ACGGCC 26

(6) INFORMATION ON SEQUENCE SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 33 nucleotides

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Synthetic DNA
(iv) ANTISENSE: Yes

(vii) IMMEDIATE SOURCE: oligonucleotide synthesiser
(ix) FEATURE

(A) NAME: PROT(BK-K4)AS
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
CTACTTCTTC TTCTTGCTAG CCCGCATAGT AGT 33

(7) INFORMATION ON SEQUENCE SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 26 nucleotides

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Synthetic DNA
(iv) ANTISENSE: No

(vii) IMMEDIATE SOURCE: oligonucleotide synthesiser
(ix) FEATURE

(A) NAME: PROT(H-K4)S
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
GAGATACATA TGGCGCCTAT CACGGC 26

(8) INFORMATION ON SEQUENCE SEQ ID NO: 8:
(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 42 nucleotides

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Synthetic DNA

(iv) ANTISENSE: Yes

(vii) IMMEDIATE SOURCE: oligonucleotide synthesiser

(ix) FEATURE

(A) NAME: PROT(H-K4)AS
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
TTTGAATTCC TACTTCTTCT TCTTGCTAGC TCTCATGGTT GT 42

(9) INFORMATION ON SEQUENCE SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 27 nucleotides 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Synthetic DNA 
(iv) ANTISENSE: No 

(vii) IMMEDIATE SOURCE: oligonucleotide synthesiser 

(ix) FEATURE 

(A) NAME: PROT(J-K4)S 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
TTTCATATGG CGCCTATCAC GGCCTAT 27 

(10) INFORMATION ON SEQUENCE SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 42 nucleotides

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Synthetic DNA 
(iv) ANTISENSE: Yes 

(vii) IMMEDIATE SOURCE: oligonucleotide synthesiser 

(ix) FEATURE 

(A) NAME: PROT(J-K4)AS 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
TTTGAATTCC TACTTCTTCT TCTTGCTAGC CCGCATGGTA GT 42 

(11) INFORMATION ON SEQUENCE SEQ ID NO: 11:
(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 24 nucleotides

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Synthetic DNA
(iv) ANTISENSE: No

(vii) IMMEDIATE SOURCE: oligonucleotide synthesiser
(ix) FEATURE

(A) NAME: PROT(J8-K4)S
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
GGGAATTCCA TATGGCTCCC ATTACTGCT ACAC 24



(12) INFORMATION ON SEQUENCE SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 42 nucleotides 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Synthetic DNA 
(iv) ANTISENSE: Yes 

(vii) IMMEDIATE SOURCE: oligonucleotide synthesiser 
(ix) FEATURE 

(A) NAME: PROT(J8-K4)AS
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
TTTGAATTCC TACTTCTTCT TCTTGCTAGC CCGTGTGGCG AC 42 
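The oligonucleotides can be checked against the protein sequences. Read from their first ATG, the sense primers should encode the N-terminal Met-Ala-Pro-Ile-Thr of SEQ ID NO: 1 to 3; reverse-complemented and translated in frame 0, the antisense primers should encode the last residues of each construct followed by the Ala-Ser-Lys-Lys-Lys-Lys tail and a stop codon. The Python sketch below performs that check with a hand-written standard codon table (no external library is assumed). SEQ ID NO: 11 is left out because its printed sequence (33 bases) does not match its stated length of 24 nucleotides.

```python
BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def clean(seq: str) -> str:
    """Drop the blocks-of-ten spacing used in the sequence listing."""
    return seq.replace(" ", "").upper()

def reverse_complement(seq: str) -> str:
    return clean(seq).translate(COMPLEMENT)[::-1]

def translate(seq: str) -> str:
    """Translate complete codons starting at the first base (frame 0)."""
    seq = clean(seq)
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))

def translate_from_first_atg(seq: str) -> str:
    seq = clean(seq)
    return translate(seq[seq.find("ATG"):])

SENSE = {
    5: "GCATACATAT GGCGCCCATC ACGGCC",
    7: "GAGATACATA TGGCGCCTAT CACGGC",
    9: "TTTCATATGG CGCCTATCAC GGCCTAT",
}
ANTISENSE = {
    6: "CTACTTCTTC TTCTTGCTAG CCCGCATAGT AGT",
    8: "TTTGAATTCC TACTTCTTCT TCTTGCTAGC TCTCATGGTT GT",
    10: "TTTGAATTCC TACTTCTTCT TCTTGCTAGC CCGCATGGTA GT",
    12: "TTTGAATTCC TACTTCTTCT TCTTGCTAGC CCGTGTGGCG AC",
}

for seq_id, primer in SENSE.items():
    print(f"SEQ ID NO: {seq_id:<3} sense      {translate_from_first_atg(primer)}")
for seq_id, primer in ANTISENSE.items():
    print(f"SEQ ID NO: {seq_id:<3} antisense  {translate(reverse_complement(primer))}")
```

For the 42-mers the translation continues past the stop codon into the 5' extension of the primer, which is why those outputs end in a few extra residues after the `*`.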



WO 98/12308                                   PCT/IT97/00228

CLAIMS

1. Isolated and purified polypeptides containing the HCV NS3 serine protease domain sequence, characterised in that they have at their C-terminal end a tail of at least three lysines.

2. Expression vectors for the production of the polypeptides according to claim 1, characterised in that they comprise: a polynucleotide coding for one of said polypeptides; regulation and translation sequences functional in said host cell, operably linked to said polynucleotide; and, optionally, a selectable marker.

3. A prokaryotic cell, characterised in that it is transformed with an expression vector containing a DNA sequence coding for the polypeptides according to claim 1, so as to allow said host cell to express the specific polypeptide encoded by the chosen sequence.

4. A process for the preparation of polypeptides containing the HCV NS3 serine protease domain sequence, characterised in that it comprises the following operations:

- transformation of a prokaryotic host cell with an expression vector containing a DNA sequence coding for a polypeptide containing the HCV NS3 serine protease domain sequence;

- growth of the prokaryotic host cell on a special culture medium containing Zn2+ or, alternatively, salts of transition metals such as Co, Cd, Mn, Cu, Ni, Ag, Fe, Cr, Hg, Au, Pt, V;

- expression of the DNA sequence required to produce the chosen polypeptide;

- purification of the polypeptide without having to resort to resolubilisation protocols, and without the need for renaturation of the protein from inclusion bodies,

said procedure making it possible to obtain said polypeptides in their native, soluble form suitable to enable determination of the three-dimensional structure of the enzyme by means of NMR or X-ray crystallography techniques.

5. A process for the renaturation in vitro of polypeptides containing the HCV NS3 serine protease domain sequence, characterised in that it comprises the following operations:

- transformation of a prokaryotic host cell using an expression vector containing a DNA sequence coding for a polypeptide that contains the HCV NS3 serine protease domain sequence;

- expression of the DNA sequence required to produce the chosen polypeptide;

- purification of the denatured polypeptide and renaturation of the protein using buffers containing Zn2+ or, alternatively, salts of transition metals such as Co, Cd, Mn, Cu, Ni, Ag, Fe, Cr, Hg, Au, Pt, V,

said procedure making it possible to obtain said polypeptides in their native, soluble form suitable to enable determination of the three-dimensional structure of the enzyme by means of NMR or X-ray crystallography techniques.
[Figure: sequence alignment, domain 2]